Towards Stable Backdoor Purification through Feature Shift Tuning
It has been widely observed that deep neural networks (DNNs) are vulnerable to backdoor attacks, in which attackers maliciously manipulate model behavior by tampering with a small set of training samples. Although a line of defense methods has been proposed to mitigate this threat, they either require complicated modifications to the training process or rely heavily on a specific model architecture, which makes them hard to deploy in real-world applications. Therefore, in this paper, we instead start with fine-tuning, one of the most common and easy-to-deploy backdoor defenses, and evaluate it comprehensively against diverse attack scenarios. Our initial experiments show that, in contrast to the promising defensive results at high poisoning rates, vanilla tuning methods completely fail in low-poisoning-rate scenarios. Our analysis shows that at low poisoning rates, the entanglement between backdoor and clean features undermines the effect of tuning-based defenses. It is therefore necessary to disentangle the backdoor and clean features in order to improve backdoor purification.
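As background for the "vanilla tuning" baseline the abstract refers to, the following is a minimal, illustrative sketch of fine-tuning-based purification on a toy linear model: a possibly-backdoored weight matrix is simply re-fit to a small clean set, and any purification effect comes from that re-fitting alone. The model, data, and hyperparameters are assumptions for illustration, not the paper's actual setup or its proposed Feature Shift Tuning method.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fine_tune(W, X_clean, y_clean, lr=0.5, epochs=200):
    """Gradient-descent fine-tuning of a linear head on a small clean set.
    Stands in for 'vanilla tuning': all purification signal comes from
    re-fitting the (possibly backdoored) weights to clean labels."""
    n, num_classes = len(X_clean), W.shape[1]
    Y = np.eye(num_classes)[y_clean]
    for _ in range(epochs):
        P = softmax(X_clean @ W)
        W -= lr * X_clean.T @ (P - Y) / n  # cross-entropy gradient step
    return W

# Toy demo: a 'backdoored' head whose feature 4 forces class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(int)        # clean rule: sign of feature 0
W = np.zeros((5, 2))
W[4, 1] = 5.0                        # planted backdoor weight
W = fine_tune(W, X, y)
pred = (X @ W).argmax(axis=1)
print((pred == y).mean())            # clean accuracy after tuning
```

In this toy setting the clean data re-fits the head and shrinks the planted weight; the paper's point is precisely that at low poisoning rates, entangled backdoor and clean features prevent such tuning from working so cleanly.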
Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization
Yang, Guangmingmei, Miller, David J., Kesidis, George
Most post-training backdoor detection methods rely on attacked models exhibiting extreme outlier detection statistics for the target class of an attack, compared to non-target classes. However, these approaches may fail: (1) when some (non-target) classes are easily discriminable from all others, in which case they may naturally achieve extreme detection statistics (e.g., decision confidence); and (2) when the backdoor is subtle, i.e., with its features weak relative to intrinsic class-discriminative features. A key observation is that the backdoor target class has contributions to its detection statistic from both the backdoor trigger and from its intrinsic features, whereas non-target classes only have contributions from their intrinsic features. To achieve more sensitive detectors, we thus propose to suppress intrinsic features while optimizing the detection statistic for a given class. For non-target classes, such suppression will drastically reduce the achievable statistic, whereas for the target class the (significant) contribution from the backdoor trigger remains. In practice, we formulate a constrained optimization problem, leveraging a small set of clean examples from a given class, and optimizing the detection statistic while orthogonalizing with respect to the class's intrinsic features. We dub this plug-and-play approach Class Subspace Orthogonalization (CSO) and assess it against challenging mixed-label and adaptive attacks.
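The core mechanism described above, optimizing a detection statistic while orthogonalizing against a class's intrinsic feature subspace, can be sketched as follows. This is an illustrative reading, not the authors' implementation: the subspace is taken as the top principal directions of clean features, and the detection statistic is stood in for by a fixed gradient direction.

```python
import numpy as np

def intrinsic_subspace(features, k):
    """Top-k principal directions of clean features from one class."""
    X = features - features.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # (d, k), orthonormal columns

def orthogonalized_step(grad, U):
    """Project an ascent step onto the complement of the intrinsic
    subspace, so intrinsic-feature contributions are suppressed."""
    return grad - U @ (U.T @ grad)

# Toy demo: ascend a stand-in statistic gradient w, staying orthogonal
# to the class subspace U estimated from a few clean feature vectors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))      # clean features for one class
U = intrinsic_subspace(feats, k=3)
w = rng.normal(size=8)                # stand-in class-logit gradient
delta = np.zeros(8)
for _ in range(100):
    delta += 0.1 * orthogonalized_step(w, U)
# delta remains (numerically) orthogonal to every intrinsic direction
print(np.abs(U.T @ delta).max())
```

The design intent mirrors the abstract: for a non-target class the projection removes most of what drives its statistic, while a genuine backdoor direction, being largely outside the intrinsic subspace, survives the projection.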
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Dataset Poisoning Attacks on Behavioral Cloning Policies
Kalra, Akansha, Datta, Soumil, Gilmore, Ethan, La, Duc, Tao, Guanhong, Brown, Daniel S.
Behavior Cloning (BC) is a popular framework for training sequential decision policies from expert demonstrations via supervised learning. As these policies are increasingly deployed in the real world, their robustness and potential vulnerabilities are an important concern. In this work, we perform the first analysis of the efficacy of clean-label backdoor attacks on BC policies. Our backdoor attacks poison a dataset of demonstrations by injecting a visual trigger to create a spurious correlation that can be exploited at test time. We evaluate how policy vulnerability scales with the fraction of poisoned data, the strength of the trigger, and the trigger type. We also introduce a novel entropy-based test-time trigger attack that identifies critical states where triggering the backdoor is expected to be most effective, substantially degrading policy performance. We empirically demonstrate that BC policies trained on even minimally poisoned datasets exhibit deceptively high, near-baseline task performance despite being highly vulnerable to backdoor trigger attacks during deployment. Our results underscore the urgent need for more research into the robustness of BC policies, particularly as large-scale datasets are increasingly used to train policies for real-world cyber-physical systems.
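One plausible reading of the entropy-based critical-state selection described above is sketched below: score each state by the entropy of the policy's action distribution and trigger at the most decisive (lowest-entropy) states. The low-entropy criterion, the budget parameter, and the toy rollout are all assumptions for illustration; the paper's actual criterion may differ.

```python
import numpy as np

def action_entropy(probs):
    """Shannon entropy of a policy's action distribution at one state."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def select_trigger_states(policy_probs, budget):
    """Pick the `budget` states where the policy is most decisive
    (lowest entropy), as candidate critical states for triggering."""
    scores = np.array([action_entropy(p) for p in policy_probs])
    return np.argsort(scores)[:budget]

# Toy rollout: 5 states, 3 actions each.
rollout = np.array([
    [0.98, 0.01, 0.01],  # near-deterministic -> candidate critical state
    [0.34, 0.33, 0.33],  # indifferent
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.20, 0.20],
])
print(select_trigger_states(rollout, budget=2))  # → [0 2]
```

The intuition for the low-entropy choice: where the clean policy is highly confident, flipping its action via the backdoor is most likely to derail an otherwise committed trajectory.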
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.68)
Deep Neural Networks (DNNs) are extensively applied in today's society, especially in safety-critical scenarios such as autonomous driving and face verification. B.2 Attack Configurations: all experiments were conducted on 4 NVIDIA 3090 GPUs. For LC, we adopt the pre-generated invisible trigger from BackdoorBench. Figure 9 visualizes the backdoored images for the evaluated attacks (Original, BadNet, Blended, WaNet, SIG, SSBA, LC). FE-tuning and FT-init are the methods proposed in our paper.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System
He, Haorui, Li, Yupeng, Zhu, Bin Benjamin, Wen, Dacheng, Cheng, Reynold, Lau, Francis C. M.
State-of-the-art (SOTA) fact-checking systems combat misinformation by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanations for the verdicts). The security of these systems is crucial, as compromised fact-checkers can amplify misinformation, yet it remains largely underexplored. To bridge this gap, this work introduces a novel threat model against such fact-checking systems and presents Fact2Fiction, the first poisoning attack framework targeting SOTA agentic fact-checking systems. Fact2Fiction employs LLMs to mimic the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%-21.2% higher attack success rates than SOTA attacks across various poisoning budgets and exposes security weaknesses in existing fact-checking systems, highlighting the need for defensive countermeasures.
- Government (1.00)
- Media > News (0.87)
- Information Technology > Security & Privacy (0.68)
Enhancing All-to-X Backdoor Attacks with Optimized Target Class Mapping
Wang, Lei, Tian, Yulong, Han, Hao, Xu, Fengyuan
Backdoor attacks pose severe threats to machine learning systems, prompting extensive research in this area. However, most existing work focuses on single-target All-to-One (A2O) attacks, overlooking the more complex All-to-X (A2X) attacks with multiple target classes, which are often assumed to have low attack success rates. In this paper, we first demonstrate that A2X attacks are robust against state-of-the-art defenses. We then propose a novel attack strategy that enhances the success rate of A2X attacks while maintaining robustness by optimizing the grouping and target class assignment mechanisms. Our method improves the attack success rate by up to 28%, with average improvements of 6.7%, 16.4%, and 14.1% on CIFAR10, CIFAR100, and Tiny-ImageNet, respectively. We anticipate that this study will raise awareness of A2X attacks and stimulate further research in this under-explored area.
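The target-class assignment step mentioned above can be cast, purely as an illustrative formalization and not the authors' actual objective, as a combinatorial assignment problem: give each source class a distinct target class, forbid self-mapping (the defining A2X constraint), and optimize some per-pair cost with the Hungarian algorithm. The similarity-based cost here is a hypothetical stand-in.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_targets(similarity):
    """Assign every source class a distinct target class while
    forbidding self-mapping, minimizing total pairwise cost."""
    cost = similarity.astype(float).copy()
    np.fill_diagonal(cost, 1e9)  # large penalty: no class targets itself
    _, cols = linear_sum_assignment(cost)
    return cols  # cols[i] = target class assigned to source class i

# Toy demo with a random 4x4 class-similarity matrix.
rng = np.random.default_rng(1)
sim = rng.random((4, 4))
targets = assign_targets(sim)
print(targets)
```

Whether the real method minimizes or maximizes such a cost, and what quantity it is built from, is not stated in the abstract; the sketch only shows how the no-self-mapping constraint and one-to-one assignment can be enforced mechanically.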
fa0126bb7ebad258bf4ffdbbac2dd787-Supplemental-Conference.pdf
This document provides additional details, analysis, and experimental results. To evaluate our method, we use four datasets: MNIST, CIFAR10, GTSRB (German Traffic Sign Recognition Benchmark), and T-IMNET (Tiny-ImageNet). Note that MNIST, CIFAR10, and GTSRB have been widely used in the literature on backdoor attacks against DNNs. During the evaluation stage, no augmentation is applied.